Histogram Sort with Sampling

Authors

  • Vipul Harsh
  • Laxmikant Kale
  • Edgar Solomonik
Abstract

To minimize data movement, state-of-the-art parallel sorting algorithms use sampling and histogramming techniques to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of the keys, while histogramming enables evaluation and iterative improvement of a given partitioning. We introduce Histogram sort with sampling (HSS), which combines sampling and histogramming techniques to find high-quality partitions with minimal data movement and high practical performance. To find this partitioning, our algorithm requires a factor of Θ(log(p)/log log(p)) less communication than the best known (recently introduced) alternative, and substantially less than standard variants of Sample sort and Histogram sort. We provide a distributed-memory implementation of the proposed algorithm, compare its performance to two existing implementations, and present a brief application study showing the benefit of the new algorithm.
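
The interplay of sampling and histogramming described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation and does not reproduce the sample sizes or round schedule analyzed in the paper. The function names `global_ranks` and `find_splitters`, the parameters `eps`, `sample_size`, and `max_rounds`, and the representation of each processor as a sorted Python list are all assumptions made for illustration. The sketch also assumes distinct keys and at least as many keys as processors.

```python
import bisect
import random


def global_ranks(chunks, probes):
    """Histogramming step: for each probe, count the keys smaller than it."""
    # In a distributed setting each processor would count locally and the
    # counts would be summed with a reduction; here we simply loop over chunks.
    return [sum(bisect.bisect_left(c, q) for c in chunks) for q in probes]


def find_splitters(chunks, eps=0.05, sample_size=64, seed=0, max_rounds=100):
    """Pick p-1 splitter keys whose global ranks lie near i*n/p (hypothetical helper).

    chunks: list of p sorted lists of distinct keys, one per simulated processor.
    A splitter is accepted once its rank is within eps*n/p of its target rank.
    """
    rng = random.Random(seed)
    p = len(chunks)
    n = sum(len(c) for c in chunks)
    targets = [i * n // p for i in range(1, p)]
    tol = eps * n / p

    lo_key = min(c[0] for c in chunks if c)
    hi_key = max(c[-1] for c in chunks if c)
    lo_rank, hi_rank = global_ranks(chunks, [lo_key, hi_key])
    # brackets[i] = [(key, rank) at or below target i, (key, rank) at or above it]
    brackets = [[(lo_key, lo_rank), (hi_key, hi_rank)] for _ in targets]
    done = [False] * len(targets)

    def closest(i):
        # Bracket endpoint whose rank is nearest to target i.
        return min(brackets[i], key=lambda kr: abs(kr[1] - targets[i]))

    for _ in range(max_rounds):
        open_ids = [i for i in range(len(targets)) if not done[i]]
        if not open_ids:
            break
        # Sampling step: draw probes only from keys inside still-open brackets.
        candidates = set()
        for i in open_ids:
            lo, hi = brackets[i][0][0], brackets[i][1][0]
            for c in chunks:
                candidates.update(c[bisect.bisect_right(c, lo):bisect.bisect_left(c, hi)])
        k = min(sample_size, len(candidates))
        probes = sorted(rng.sample(sorted(candidates), k))
        ranks = global_ranks(chunks, probes)
        # Use the histogram to tighten each open bracket around its target rank.
        for i in open_ids:
            t = targets[i]
            for key, rank in zip(probes, ranks):
                if brackets[i][0][1] <= rank <= t:
                    brackets[i][0] = (key, rank)
                if t <= rank <= brackets[i][1][1]:
                    brackets[i][1] = (key, rank)
            gap = brackets[i][1][1] - brackets[i][0][1]
            if abs(closest(i)[1] - t) <= tol or gap <= 1:
                done[i] = True
    return [closest(i)[0] for i in range(len(targets))]
```

Calling `find_splitters(chunks)` on, say, eight sorted lists returns seven keys; `global_ranks(chunks, splitters)` can then be used to check how far each splitter's rank is from its target. Later rounds sample only from keys inside unresolved brackets, which is the sense in which the histogram guides where further samples are drawn.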

Similar resources

High-speed parallel external sorting of data with arbitrary distribution

Many parallel sorting algorithms for (external) disk data have been reported, such as NOWsort, SPsort, and hill sort. They all reduce the execution time compared to some known sequential sort; however, they differ in terms of speed, throughput, and cost-effectiveness. Mostly they deal with data that are uniformly distributed in their value range. If we divide and redistribute data to pro...

Distribution-Insensitive Parallel External Sorting on PC Clusters

Many parallel external sorting algorithms have been reported, such as NOW-Sort, SPsort, and hill sort. They are for sorting large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Mostly they deal with data that are uniformly distributed in their value range. Few research results have yet been reported for parallel external sort for data w...

Sorting on a Graphics Processing Unit (GPU)

Contents excerpt: 2.1 Graphics Processing Units; 2.2 Sorting Numbers on GPUs; 2.2.1 SDK Radix Sort Algorithm; 2.2.1.1 Step 1–Sorting tiles ...

Node Histogram vs. Edge Histogram: A Comparison of PMBGAs in Permutation Domains

Previous papers have proposed an algorithm called the edge histogram sampling algorithm (EHBSA) that models the relative relation between two nodes (edge) of permutation strings of a population within the PMBGA framework for permutation domains. This paper proposes another histogram based model we call the node histogram sampling algorithm (NHBSA). The NHBSA models node frequencies at each abso...

MP-sort: Sorting at Scale on Blue Waters – for a Cosmological Simulation

We implement and investigate a parallel sorting algorithm (MP-sort) on Blue Waters. MP-sort sorts distributed array items with non-unique integer keys into a new distributed array. The sorting algorithm belongs to the family of partition sorting algorithms: the target storage space of a parallel computing rank is represented by a histogram bin whose edges are determined by partitioning the inpu...

Publication date: 2018